[RelayMiner]: add proxy.Ping(...) capability to test connectivity between relay servers and backend URLs #1037
base: main
Thanks for picking this back up @eddyzags! 🙌
I have to stop here for today but this is looking great so far! 🚀
The biggest thing I haven't reviewed yet is the test (but I already saw the addition of go-mockdns, and I skimmed the test names 😉) and am looking forward to it.
Was this change intentionally persisted, and if so, how is it related to this feature?
I think this change should be reverted. My assumption is that this is the result of an older commit which was never reconciled completely with main:
- The yaml files referenced don't exist.
- The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist. 🤔
Sorry, I wasn't clear in my previous comments.
> Was this change intentionally persisted, and if so, how is it related to this feature?
Yes, this change was made intentionally so that the Ping safeguard at startup succeeds for the Relayminer with the default localnet configuration, or with any custom localnet configuration (link to localnet default configuration in the main branch). In the default localnet configuration, the Ollama Kubernetes deployment is not applied (ollama.enabled=false). However, the relayminer configuration files still referenced Ollama suppliers even though the container wasn't deployed (link to relayminer-1 configuration for localnet). With the newly introduced Ping safeguard at startup, this would cause the relayminer to fail continuously because the Ollama container isn't deployed.
To solve this issue, I found a way to dynamically define the relayminer's configuration based on the localnet configuration by modifying the poktrolld/Tiltfile. Hence those modifications.
poktrolld users who deploy a Relayminer without relying on the localnet will have to make sure that every config.suppliers[*].service_config.backend_url is up, running, and reachable before deploying the Relayminer.
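For operators who want to check reachability before deploying, here is a minimal sketch of such a pre-flight check. It is not the PR's implementation: the names `dialTarget` and `pingAll` are hypothetical, and it assumes that a successful TCP connection to the backend's host and port is a good enough signal of reachability.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/url"
	"time"
)

// dialTarget (hypothetical helper) returns the host:port to dial for a backend
// URL, falling back to the scheme's conventional port when none is given.
func dialTarget(backendURL string) (string, error) {
	u, err := url.Parse(backendURL)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", backendURL, err)
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("backend URL %q has no host", backendURL)
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "https", "wss":
			port = "443"
		default:
			port = "80"
		}
	}
	return net.JoinHostPort(u.Hostname(), port), nil
}

// pingAll (hypothetical helper) tries a TCP connection to every backend URL and
// returns a joined error listing each unreachable backend, or nil if all pass.
// All dials share a single 3-second deadline.
func pingAll(backendURLs []string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	var dialer net.Dialer
	var errs []error
	for _, backendURL := range backendURLs {
		target, err := dialTarget(backendURL)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		conn, err := dialer.DialContext(ctx, "tcp", target)
		if err != nil {
			errs = append(errs, fmt.Errorf("backend %q unreachable: %w", backendURL, err))
			continue
		}
		_ = conn.Close()
	}
	return errors.Join(errs...)
}

func main() {
	// Example URL taken from the relayminer config excerpt in this PR.
	if err := pingAll([]string{"http://rest:10000/"}); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("all backends reachable")
}
```

A TCP dial only proves that something is listening; a protocol-level check (for example, an HTTP request) would be stricter.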
> The yaml files referenced don't exist.
I disagree, they exist:
- values-common.yaml is defined here
- values-relayminer-common.yaml is defined here
- values-relayminer- + str(actor_number) + ".yaml" is defined here, here and here
> The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist.
I cannot find that. Can you link me to the precise line in my fork that makes you think that please? 🙏🏾
@eddyzags thanks for the detailed response here! 🙌
> The yaml files referenced don't exist.
I was referring to .yaml files referenced in this commit, but I also see that they're not referenced any more. I just didn't understand the rationale behind moving the config into the Tiltfile.
> The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist.
I was just pointing out that the config fields which you've removed from the relayminer configs correspond to the flags you've added in the Tiltfile. The point was to question why we should prefer to provide the config via flags over the yaml file, which you answered.
@eddyzags This LGTM but #PUC in the code with your explanation related to Ping safeguard.
You already have it written down anyhow :)
server.Handler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	sendJSONRPCResponse(test.t, w)
})
listener, err := net.Listen("tcp", supplierConfig.ServiceConfig.BackendUrl.Host)
Why separate the listener from the server?
By using a custom listener, and thereby decoupling listening from serving, we ensure that the HTTP server is already bound to its port in the test's main goroutine. This guarantees that the HTTP server(s) are ready before proceeding to the actual test cases.
Previously, listening and serving were both handled within the goroutine using the http.ListenAndServe function. This approach sometimes led to the HTTP server not being ready when the test cases began execution, resulting in flaky test failures.
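The pattern described above can be sketched as follows. This is an illustration under assumptions, not the PR's test code: `startServer`, `pongHandler`, and `roundTrip` are hypothetical names, and the server binds to an ephemeral local port rather than a configured backend URL.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// pongHandler is a trivial handler used only for this illustration.
func pongHandler(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, "pong")
}

// startServer binds the port synchronously in the caller's goroutine and only
// then serves in the background. Once net.Listen has returned, incoming
// connections are queued by the OS, so a client can connect immediately.
func startServer(addr string, handler http.Handler) (*http.Server, string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, "", err
	}
	srv := &http.Server{Handler: handler}
	go func() {
		// Serve returns http.ErrServerClosed after Close or Shutdown.
		_ = srv.Serve(ln)
	}()
	return srv, ln.Addr().String(), nil
}

// roundTrip starts a server, sends one request right away, and returns the body.
func roundTrip() (string, error) {
	srv, addr, err := startServer("127.0.0.1:0", http.HandlerFunc(pongHandler))
	if err != nil {
		return "", err
	}
	defer srv.Close()

	resp, err := http.Get("http://" + addr)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := roundTrip()
	fmt.Println(body, err)
}
```

With http.ListenAndServe inside the goroutine, the request in `roundTrip` could race with the bind and fail with a connection refused error.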
Amazing! 👍 #PUC with that explanation, perhaps condensed, if possible.
#PUC
Thanks for reviewing @bryanchriswhite! Waiting for the rest of the review 🚀
pkg/relayer/proxy/proxy_test.go
Outdated
relayProxyBehavior := append(t.relayerProxyBehavior, []func(*testproxy.TestBehavior){
	testproxy.WithDefaultSupplier(t.supplierOperatorPingAllKeyName, supplierEndpoints),
	testproxy.WithServicesConfigMap(servicesConfigMap),
}...)
#PUC options provided via multiple calls to testproxy.WithDefaultSupplier(...) and testproxy.WithServicesConfigMap(...) are not mutually exclusive; the former accumulates service endpoints into a testutil global variable, and the latter starts an http server for each service in the given services config map.
I'm sorry, but I'm not sure I understand this. My aim was to define two relay servers, each with its own suppliers and services, managed by the same relay proxy, and then call PingAll to see whether the mechanism works across multiple relay servers.
I've refactored this part to call testproxy.WithDefaultSupplier(...) and testproxy.WithServicesConfigMap(...) only once. eddyzags@598f85e
pkg/relayer/relayminer.go
Outdated
	return nil
}

func (rel *relayMiner) newPinghandlerFn(ctx context.Context, ln net.Listener) http.HandlerFunc {
It seems like the ln variable is unused.
- func (rel *relayMiner) newPinghandlerFn(ctx context.Context, ln net.Listener) http.HandlerFunc {
+ func (rel *relayMiner) newPinghandlerFn(ctx context.Context) http.HandlerFunc {
pkg/relayer/relayminer.go
Outdated
// ping requests. A single ping request on the relay server broadcasts a
// ping to all backing services/data nodes.
go func() {
	if err := http.Serve(ln, rel.newPinghandlerFn(ctx, ln)); err != nil && !errors.Is(http.ErrServerClosed, err) {
- if err := http.Serve(ln, rel.newPinghandlerFn(ctx, ln)); err != nil && !errors.Is(http.ErrServerClosed, err) {
+ if err := http.Serve(ln, rel.newPinghandlerFn(ctx)); err != nil && !errors.Is(http.ErrServerClosed, err) {
pkg/relayer/relayminer_test.go
Outdated
@@ -57,3 +60,69 @@ func TestRelayMiner_StartAndStop(t *testing.T) {
	err = relayminer.Stop(ctx)
	require.NoError(t, err)
}

func TestRelayMiner_Ping(t *testing.T) {
Would you mind adding one more test for the error cases where the relayer proxy mock's #PingAll() returns temporary and non-temporary *url.Errors, so that we can assert (and cover regression of) the resulting error returned from the HTTP GET on the ping endpoint?
Agree. Added here eddyzags@43838e7 (Refactor to test suite here: eddyzags@4c5bc4b)
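A test along the lines requested above could look like the following sketch. It does not reproduce the linked commits: the `pinger` interface, `stubPinger`, and the mapping of temporary errors to 504 and other errors to 502 are assumptions made for illustration. Only the 204 success status comes from the PR's documentation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// pinger is a hypothetical stand-in for the part of the relayer proxy used here.
type pinger interface {
	PingAll(ctx context.Context) error
}

// stubPinger returns a fixed error from PingAll.
type stubPinger struct{ err error }

func (s stubPinger) PingAll(context.Context) error { return s.err }

// temporaryErr reports itself as temporary, which url.Error.Temporary honors.
type temporaryErr struct{}

func (temporaryErr) Error() string   { return "temporary failure" }
func (temporaryErr) Temporary() bool { return true }

func newTemporaryURLError() error {
	return &url.Error{Op: "Get", URL: "http://backend.invalid", Err: temporaryErr{}}
}

func newPermanentURLError() error {
	return &url.Error{Op: "Get", URL: "http://backend.invalid", Err: errors.New("connection refused")}
}

// newPingHandler answers 204 when PingAll succeeds. The error statuses below
// (504 for temporary, 502 otherwise) are assumed, not taken from the PR.
func newPingHandler(p pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		err := p.PingAll(r.Context())
		if err == nil {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		var urlErr *url.Error
		if errors.As(err, &urlErr) && urlErr.Temporary() {
			http.Error(w, err.Error(), http.StatusGatewayTimeout)
			return
		}
		http.Error(w, err.Error(), http.StatusBadGateway)
	}
}

// statusFor runs one GET through the handler and returns the response status.
func statusFor(pingErr error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	newPingHandler(stubPinger{err: pingErr}).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(nil), statusFor(newTemporaryURLError()), statusFor(newPermanentURLError()))
	// prints "204 504 502"
}
```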
(cc @okdas)
…ty between relay servers and backend URLs (#1)
* relayer: add RelayServers() method to RelayProxy interface; Add Ping(), ServiceIDs(), Forward() method to RelayServer interface; add RelayServers slice with helper method byServiceID
* relayer: add forward config entry
* relayer: implement ServiceIDs, Forward, and Ping method for synchrounous RPC server
* relayer: add RelayServers implementation for RelayProxy
* relayer: add Ping and Forward options
* relayer: integrate ping option
* relayer: add ServePing and ServeForward method to RelayMiner
* test proxy.Ping() in test + remove forward feature
* add serve ping test
* add doc
…s based on localnet config
… avoid flaky tests
@eddyzags Wasn't my intention to have this hanging for so long, but I'm glad @bryanchriswhite and you had a good back & forth to get it here.
In terms of next steps:
- Please see @bryanchriswhite's comments
- See my minor NITs
- Merge with the latest main
- @okdas will review the one TiltFile / k8s related comment
- Can you upload a video to the github PR description showing this
Especially as PGAT is getting kicked off (and we have some large beta users), I think everyone will love this!
### Localnet Helpers ###
########################

.PHONY: localnet_relayminer1_ping
Can you move these into a ping.mk and add an import at the bottom of this makefile?
backend_url: http://rest:10000/
publicly_exposed_endpoints:
- relayminer1
suppliers: [] # suppliers list is dynamically defined in poktroll/Tiltfile.
I just want a 👍 from @okdas in case this has downstream effects on our E2E testing in ephemeral DevNets.
I don't see this breaking e2e on devnets.
Configures a `ping` server to test the connectivity of all backend URLs. If
all the backend URLs are reachable, the endpoint returns a 204 HTTP
Code. If one or more backend URLs aren't reachable, the service
returns an appropriate HTTP error.
- Configures a `ping` server to test the connectivity of all backend URLs. If
- all the backend URLs are reachable, the endpoint returns a 204 HTTP
- Code. If one or more backend URLs aren't reachable, the service
- returns an appropriate HTTP error.
+ // ConfigurePingHandler sets up a health check server that:
+ // - Tests connectivity to all configured backend URLs
+ // - Returns HTTP 204 if all backends are reachable
+ // - Returns appropriate HTTP error if any backend is unreachable
// NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy. This mock
// RelayerProxy will expect a call to ServedRelays with the given context, and
// when that call is made, returnedRelaysObs is returned. It also expects a call
// to Start, Ping, and Stop with the given context.
- // NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy. This mock
- // RelayerProxy will expect a call to ServedRelays with the given context, and
- // when that call is made, returnedRelaysObs is returned. It also expects a call
- // to Start, Ping, and Stop with the given context.
+ // NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy that:
+ // - Expects a call to ServedRelays with the given context
+ // - Returns returnedRelaysObs when ServedRelays is called
+ // - Expects one call each to Start, Ping, and Stop with the given context
}()

go func() {
	<-ctx.Done()
#PUC when we expect this context to close.
My guess (assumption, gut intuition) is that it closes when the process shuts down, but making it explicit would be nice.
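To make the lifecycle concrete, here is a minimal sketch of a server that stops when its context is cancelled. It assumes, as the comment above guesses, that cancellation corresponds to process shutdown. `servePing` and `runAndCancel` are hypothetical names, not functions from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// servePing serves on ln until ctx is cancelled, then shuts the server down.
// A clean shutdown is reported as nil rather than http.ErrServerClosed.
func servePing(ctx context.Context, ln net.Listener, handler http.Handler) error {
	srv := &http.Server{Handler: handler}

	go func() {
		// Blocks until the caller cancels ctx, e.g. at process shutdown.
		<-ctx.Done()
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = srv.Shutdown(shutdownCtx)
	}()

	err := srv.Serve(ln)
	if errors.Is(err, http.ErrServerClosed) {
		return nil
	}
	return err
}

// runAndCancel starts the server, cancels its context, and waits for it to exit.
func runAndCancel() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- servePing(ctx, ln, http.NotFoundHandler()) }()

	cancel() // stands in for the shutdown signal
	select {
	case err := <-done:
		return err
	case <-time.After(10 * time.Second):
		return errors.New("server did not stop after cancellation")
	}
}

func main() {
	fmt.Println("stopped with error:", runAndCancel())
}
```

Note that errors.Is takes the error being inspected first and the target second, as in `errors.Is(err, http.ErrServerClosed)`.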
// Start a long-lived goroutine that starts an HTTP server responding to
// ping requests. A single ping request on the relay server broadcasts a
// ping to all backing services/data nodes.
- // Start a long-lived goroutine that starts an HTTP server responding to
- // ping requests. A single ping request on the relay server broadcasts a
- // ping to all backing services/data nodes.
+ // StartPingServer launches a goroutine that:
+ // - Creates a long-running HTTP server
+ // - Handles ping requests by broadcasting health checks to all backing services
+ // - Tests connectivity to all configured data nodes
// RelayMinerPingConfig is the structure resulting from parsing the ping
// server configuration.
type RelayMinerPingConfig struct {
- type RelayMinerPingConfig struct {
+ // TODO_TECHDEBT(@red-0ne): Remove this structure altogether. See the discussion here for ref:
+ // https://github.com/pokt-network/poktroll/pull/1037/files#r1928599958
+ type RelayMinerPingConfig struct {
relayMinerConfig.Ping = &RelayMinerPingConfig{
	Enabled: yamlRelayMinerConfig.Ping.Enabled,
	Addr:    yamlRelayMinerConfig.Ping.Addr,
}
Let's make this change.
@@ -206,3 +214,22 @@ func (rp *relayerProxy) validateConfig() error {

	return nil
}

// PingAll tests the connectivity between all the managed relay servers and their respective backend URLs.
func (rp *relayerProxy) PingAll(ctx context.Context) error {
Love this
Only checked the Tiltfile and kubernetes yaml part - looks good to me. Thank you @eddyzags!
Summary
This PR adds the capability to test the connectivity between the Relay Servers and the Backend URLs in two ways.
Safeguard at Startup:
For every suppliers.[].service_config.backend_url referenced as input inside the Relay Miner Configuration file, the Relay Proxy will verify whether the network connection between the targeted backend_url and the relay miner process is functioning properly. If one or more connections aren't possible, the relay miner won't be able to start.
Configurable Ping HTTP server:
The Relay Miner process will listen for incoming requests to synchronously test the connectivity of every referenced suppliers.[].service_config.backend_url. If one or more backend URLs aren't reachable, the incoming request will fail.
Based on the serverConfig.ServerType (Example: HTTP), each Server Type implements its own logic to test the connectivity.
Issue
Type of change
Select one or more:
Testing
- Documentation changes (only if making doc changes): make docusaurus_start; only needed if you make doc changes
- Local Testing (only if making code changes): make go_develop_and_test, make test_e2e
- PR Testing (only if making code changes): devnet-test-e2e label to the PR. make trigger_ci if you want to re-trigger tests without any code changes
Sanity Checklist
Summary by CodeRabbit
New Features
- ping functionality, allowing users to test backend connectivity within the relay miner's setup.
Bug Fixes
Tests